1. Introduction

The goal is to use the Washington Post police shootings dataset and related datasets (demographic data, state spending, homicides) to determine whether we can provide any actionable suggestions to the US Justice Department on how to decrease the overall number of police shootings.

The analysis is split into three parts, each addressing a different set of questions:

  1. Overview of the variables in the core dataset and state-level analysis

    • Are certain races disproportionately affected?
    • What are the predominant circumstances and reasons under which police shootings occur?
    • What's the effect of other factors like victims' age, gender, body camera usage etc.?
  2. State level analysis and clustering based on demographic data

    • What's the relationship between state level homicide rates and the number of police shootings?
    • What effect does spending on law enforcement have?
    • Could we cluster states using demographic and economic data to find similar states with varying numbers of police shootings (so that policies, training standards and organizational structures used in better-performing states could possibly be implemented in worse-performing ones)?
  3. Building a state level statistical model which would explain the effect various socio-demographic variables have on the number of police shootings.

    • Can we find states that perform better or worse than expected (based on their socio-demographic profile)?
      • Can other similar states adopt some of the policies and training standards of the best-performing states?

2. Overview of the dataset

The goal of this part is to analyze the main shootings dataset and look for patterns between different races, age groups, genders etc.

The purpose is to determine what proportion of the variance could be explained by immutable features (which law enforcement agencies can't directly affect) and whether the unexplained variance could be related to racial and other prejudices.

2.1. Chronological Distribution of Police Shootings

In total there were 8635 recorded police shootings between 2015-01-02 and 2023-07-22, averaging about 1010 per year.

The number of police shootings was relatively stable prior to 2020 at about 80 killings per month (even though there was significant month-to-month variance). In recent years it has increased to about 90 per month.
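The yearly and monthly averages quoted above can be checked directly, and monthly counts come from grouping shooting dates by calendar month. A minimal sketch with pandas; `monthly_counts` is a hypothetical helper, and its `date` column name is an assumption about the dataset schema.

```python
import pandas as pd

# Figures quoted in the text above
TOTAL_SHOOTINGS = 8635
FIRST_DATE = pd.Timestamp("2015-01-02")
LAST_DATE = pd.Timestamp("2023-07-22")

# Length of the observation window in (average-length) years
span_years = (LAST_DATE - FIRST_DATE).days / 365.25
per_year = TOTAL_SHOOTINGS / span_years   # roughly 1010
per_month = per_year / 12                 # roughly 84


def monthly_counts(df: pd.DataFrame, date_col: str = "date") -> pd.Series:
    """Number of shootings in each calendar month.

    `date_col` is an assumed column name; adjust it to the actual schema.
    Months with no shootings are omitted rather than reported as 0.
    """
    months = pd.to_datetime(df[date_col]).dt.to_period("M")
    return months.value_counts().sort_index()


if __name__ == "__main__":
    print(f"{span_years:.2f} years, {per_year:.0f} per year, {per_month:.1f} per month")
    # Made-up rows, only to show the call
    toy = pd.DataFrame({"date": ["2020-01-05", "2020-01-20", "2020-02-01"]})
    print(monthly_counts(toy))
```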

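Table: police shootings by state and victim race: counts, shootings per 1000k residents of each group, and "diff exp." values (the "Country" row gives national figures)
# state Asian Black Hispanic Native_American Other White All All_per_1000k White_per_1000k Black_per_1000k Asian_per_1000k Native_American_per_1000k Hispanic_per_1000k diff_exp_White diff_exp_Black diff_exp_Asian diff_exp_Native_American diff_exp_Hispanic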
0 AK 2.0 3.0 0.0 12.0 0.0 31.0 48.0 7.620 7.356 12.211 5.205 12.871 0.000 2.258220 NaN NaN 11.743240 NaN
1 AL 0.0 43.0 4.0 0.0 0.0 91.0 138.0 3.328 3.149 3.884 0.000 0.000 2.353 0.829384 2.995424 NaN NaN NaN
2 AR 2.0 31.0 0.0 0.0 0.0 67.0 100.0 3.943 3.314 7.835 5.257 0.000 0.000 0.171429 7.219892 NaN NaN NaN
3 AZ 0.0 32.0 115.0 16.0 0.0 150.0 313.0 5.438 3.114 11.829 0.000 5.245 6.551 -1.437606 11.573414 NaN 4.956786 4.892410
4 CA 41.0 173.0 448.0 5.0 8.0 291.0 966.0 2.912 1.198 8.022 0.858 0.886 3.498 -0.933584 7.832720 0.438672 0.836496 2.373968
5 CO 5.0 25.0 67.0 5.0 0.0 142.0 244.0 5.328 3.536 12.131 3.522 6.824 6.901 -1.136656 11.891240 3.356832 6.738752 5.771464
6 CT 0.0 5.0 7.0 0.0 0.0 10.0 22.0 0.715 0.400 1.414 0.000 0.000 1.517 -0.180580 1.331775 NaN NaN 1.409750
7 Country 146.0 1971.0 1306.0 116.0 22.0 3735.0 7296.0 2.676 1.770 5.477 0.992 3.546 2.753 -0.301224 5.123768 0.847496 3.513888 2.287376
8 DC 0.0 22.0 0.0 0.0 0.0 2.0 24.0 4.260 0.814 7.969 0.000 0.000 0.000 NaN 5.881600 NaN NaN NaN
9 DE 0.0 7.0 1.0 0.0 0.0 7.0 15.0 1.875 1.236 3.941 0.000 0.000 1.405 -0.091500 3.524750 NaN NaN NaN
10 FL 2.0 164.0 69.0 0.0 4.0 248.0 487.0 2.863 1.874 5.739 0.420 0.000 1.683 -0.353414 5.258016 NaN NaN 0.993017
11 GA 3.0 128.0 19.0 0.0 0.0 126.0 276.0 3.197 2.350 4.707 0.914 0.000 2.366 0.364663 3.699945 NaN NaN 2.068679
12 HI 26.0 2.0 3.0 0.0 0.0 4.0 35.0 2.884 1.234 6.591 5.712 0.000 2.447 NaN NaN 4.630500 NaN NaN
13 IA 0.0 9.0 0.0 0.0 0.0 39.0 48.0 1.807 1.594 9.964 0.000 0.000 0.000 -0.070247 9.902562 NaN NaN NaN
14 ID 1.0 1.0 7.0 1.0 0.0 49.0 59.0 4.222 3.750 8.945 5.111 4.209 4.174 -0.197570 NaN NaN NaN 3.667360
15 IL 0.0 81.0 17.0 0.0 0.0 51.0 149.0 1.353 0.598 5.003 0.000 0.000 0.924 -0.450575 4.804109 NaN NaN 0.698049
16 IN 1.0 40.0 5.0 0.0 0.0 98.0 144.0 2.553 2.018 7.387 0.886 0.000 1.343 -0.180133 7.141912 NaN NaN 1.174502
17 KS 0.0 11.0 12.0 1.0 0.0 55.0 79.0 3.182 2.552 7.032 0.000 3.356 4.239 -0.209976 6.831534 NaN NaN 3.876252
18 KY 1.0 23.0 3.0 1.0 0.0 103.0 131.0 3.471 3.091 7.433 1.893 8.833 2.338 0.026107 7.148378 NaN NaN NaN
19 LA 2.0 90.0 5.0 0.0 0.0 57.0 154.0 3.874 2.261 6.965 2.795 0.000 2.620 -0.195116 5.705950 NaN NaN 2.434048
20 MA 1.0 12.0 9.0 0.0 1.0 28.0 51.0 0.884 0.588 2.507 0.275 0.000 1.445 -0.142184 2.433628 NaN NaN 1.349528
21 MD 1.0 68.0 5.0 0.0 0.0 38.0 112.0 2.192 1.237 4.392 0.306 0.000 1.052 -0.080392 3.727824 NaN NaN 0.848144
22 ME 0.0 1.0 1.0 0.0 0.0 30.0 32.0 2.814 2.777 6.281 0.000 0.000 5.862 0.103700 NaN NaN NaN NaN
23 MI 1.0 45.0 3.0 1.0 1.0 70.0 121.0 1.428 1.034 3.740 0.407 1.686 0.738 -0.106972 3.537224 NaN NaN NaN
24 MN 4.0 20.0 4.0 6.0 0.0 52.0 86.0 1.843 1.300 7.265 1.824 9.891 1.681 -0.279451 7.156263 NaN 9.867041 NaN
25 MO 1.0 71.0 6.0 0.0 0.0 116.0 194.0 3.742 2.679 11.605 1.015 0.000 2.893 -0.445570 11.163444 NaN NaN 2.743320
26 MS 1.0 41.0 1.0 0.0 0.0 50.0 93.0 3.633 3.271 4.271 3.906 0.000 1.302 1.102099 2.908625 NaN NaN NaN
27 MT 0.0 1.0 1.0 11.0 0.0 36.0 49.0 5.599 4.601 19.045 0.000 19.043 3.265 -0.404506 NaN NaN 18.673466 NaN
28 NC 4.0 77.0 11.0 1.0 0.0 133.0 226.0 2.658 2.188 4.098 1.742 0.735 1.437 0.287530 3.510582 NaN NaN 1.197780
29 ND 0.0 1.0 0.0 8.0 1.0 9.0 19.0 3.005 1.598 7.531 0.000 23.430 0.000 -1.079455 NaN NaN 23.267730 NaN
30 NE 0.0 6.0 4.0 1.0 0.0 30.0 41.0 2.549 2.086 7.611 0.000 4.440 2.438 -0.192806 7.486099 NaN NaN NaN
31 NH 0.0 0.0 0.0 0.0 0.0 20.0 20.0 1.763 1.875 0.000 0.000 0.000 0.000 0.217780 NaN NaN NaN NaN
32 NJ 3.0 41.0 9.0 0.0 0.0 26.0 79.0 1.034 0.466 3.625 0.418 0.000 0.610 -0.288820 3.471968 NaN NaN 0.410438
33 NM 2.0 5.0 97.0 5.0 1.0 35.0 145.0 8.131 2.370 11.215 6.597 2.696 11.403 -4.362468 11.011725 NaN 1.850376 7.524513
34 NV 2.0 22.0 37.0 1.0 0.0 57.0 119.0 4.902 3.081 9.959 0.993 2.575 5.483 -0.654324 9.512918 NaN NaN 4.120244
35 NY 1.0 61.0 14.0 1.0 0.0 56.0 133.0 0.788 0.471 2.053 0.070 0.592 0.446 -0.083752 1.914312 NaN NaN 0.299432
36 OH 2.0 89.0 2.0 0.0 2.0 128.0 223.0 2.249 1.556 7.125 1.009 0.000 0.576 -0.310670 6.841626 NaN NaN NaN
37 OK 3.0 41.0 14.0 14.0 0.0 135.0 207.0 6.243 5.421 16.058 4.308 4.691 4.308 0.732507 15.577289 NaN 4.129130 3.696186
38 OR 0.0 10.0 12.0 1.0 0.0 90.0 113.0 3.329 3.016 14.729 0.000 1.637 2.828 0.089809 14.662420 NaN NaN 2.411875
39 PA 2.0 57.0 9.0 0.0 0.0 84.0 152.0 1.390 0.927 4.494 0.554 0.000 1.247 -0.225310 4.332760 NaN NaN 1.155260
40 RI 0.0 2.0 1.0 0.0 0.0 2.0 5.0 0.554 0.260 2.879 0.000 0.000 0.792 NaN NaN NaN NaN NaN
41 SC 2.0 51.0 2.0 0.0 0.0 77.0 132.0 3.195 2.728 4.440 3.227 0.000 0.896 0.545815 3.551790 NaN NaN NaN
42 SD 1.0 0.0 0.0 7.0 0.0 13.0 21.0 2.879 2.079 0.000 10.545 10.782 0.000 -0.388303 NaN NaN 10.525769 NaN
43 TN 1.0 57.0 6.0 0.0 1.0 151.0 216.0 3.857 3.418 5.952 1.050 0.000 2.143 0.374827 5.292453 NaN NaN 1.950150
44 TX 8.0 162.0 215.0 1.0 2.0 272.0 660.0 2.863 1.475 5.623 0.771 0.434 2.417 -0.815400 5.265125 0.642165 NaN 1.311882
45 UT 2.0 8.0 17.0 2.0 0.0 56.0 85.0 3.378 2.435 24.456 3.312 5.299 5.004 -0.652492 24.412086 NaN NaN 4.547970
46 VA 1.0 58.0 5.0 1.0 1.0 80.0 146.0 2.051 1.594 4.135 0.223 2.809 0.789 0.148045 3.730953 NaN NaN 0.606461
47 VT 0.0 0.0 1.0 1.0 0.0 11.0 13.0 2.427 2.161 0.000 0.000 46.669 10.370 -0.144650 NaN NaN NaN NaN
48 WA 13.0 29.0 28.0 7.0 0.0 92.0 169.0 2.799 1.888 11.715 2.626 6.102 3.801 -0.370793 11.600241 2.396482 6.048819 3.459522
49 WI 3.0 33.0 8.0 4.0 0.0 70.0 118.0 2.397 1.619 10.156 2.344 7.387 2.500 -0.485566 9.997798 NaN NaN 2.344195
50 WV 1.0 10.0 0.0 0.0 0.0 53.0 64.0 4.045 3.575 17.558 7.901 0.000 0.000 -0.215165 17.412380 NaN NaN NaN
51 WY 0.0 2.0 2.0 2.0 0.0 14.0 20.0 4.004 3.024 25.027 0.000 14.830 4.086 -0.687708 NaN NaN NaN NaN

2.2. Racial Data

We can see that Native Americans and especially African Americans are severely over-represented among police shooting victims: the police shooting rate for African Americans is about two times higher than would be expected based on their share of the population. On the other hand, white and Asian people are much less likely to be shot.
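The "two times higher" figure can be read off the Country row of the state table: dividing a group's shooting rate per 1000k by the overall rate gives the ratio of the group's share of victims to its share of the population. A minimal sketch using only the rates copied from that row; `representation_ratios` is a hypothetical helper name.

```python
# Shootings per 1000k residents for the whole country, copied from the
# "Country" row of the state table above.
COUNTRY_RATES = {
    "All": 2.676,
    "White": 1.770,
    "Black": 5.477,
    "Asian": 0.992,
    "Native American": 3.546,
    "Hispanic": 2.753,
}


def representation_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's shooting rate to the overall rate.

    rate_group / rate_all == (group share of victims) / (group share of population),
    so 2.0 means twice as many victims as the population share would suggest
    and values below 1.0 mean under-representation.
    """
    overall = rates["All"]
    return {group: rate / overall for group, rate in rates.items() if group != "All"}


if __name__ == "__main__":
    ratios = representation_ratios(COUNTRY_RATES)
    for group, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{group:>16}: {ratio:.2f}x")
```

With these rates the ratio is about 2.05 for black people, about 1.3 for Native Americans, and below 1 for white (about 0.66) and Asian (about 0.37) people.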

We consider two possible explanations for this:

  • Socio-demographic factors (such as higher rates of poverty) mean that African Americans are more likely to engage in violent crime.
  • African Americans are more likely to be victims of racial prejudice and are more likely to be shot than White or Asian Americans while engaging in equivalent activities.

By looking at other variables and their interactions, we'll try to determine which of these explanations better accounts for the variance in police shootings between racial groups.

2.3 Geographic Distribution of Police Shootings


The map above compares the number of police shootings per capita across states. The drop-down in the top right can be used to select different races; when a specific race is selected, the map shows how much that racial group is over- or under-represented compared to the general state population.

Looking at the state level data we can see that:

  • Generally, north-eastern states have the lowest incidence of police shootings.
  • Shootings of black people (relative to population) are most frequent in Midwestern and western states; the only eastern state where black people are significantly over-represented is West Virginia.

R² = 0.323, p-value < 0.001: the relationship is statistically significant.

This chart shows the relationship between the black share of a state's population and how much black people are over-represented as victims of police shootings:

  • States further to the right have a larger black population share (e.g. in Washington DC close to 50% of residents are black).

  • The Y axis shows the ratio between the proportion of victims who are black and the black share of the state population. For example, in the topmost state, Utah, black people make up only about 1.3% of the population but about 9.4% of victims (8 of 85), so they are over-represented roughly 7.2 times.

While black people are over-represented in every state where black victims were recorded, this chart suggests that the higher the proportion of black people living in a state, the smaller their over-representation among victims.
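Both chart axes can be recovered from the state table: population sizes are proportional to victims divided by rate, so a group's population share is (group victims / group rate) divided by (all victims / overall rate), and the over-representation ratio is the group rate divided by the overall rate. A minimal sketch on six rows copied from the table, using `scipy.stats.linregress`; `black_share_and_ratio` is a hypothetical helper, and since the text does not say which states or filters produced R² = 0.323, the fit on this subset will not reproduce that value.

```python
from scipy import stats

# (black victims, black rate per 1000k, all victims, overall rate per 1000k),
# copied from a sample of rows in the state table above.
STATE_ROWS = {
    "UT": (8, 24.456, 85, 3.378),
    "OR": (10, 14.729, 113, 3.329),
    "DC": (22, 7.969, 24, 4.260),
    "MS": (41, 4.271, 93, 3.633),
    "GA": (128, 4.707, 276, 3.197),
    "LA": (90, 6.965, 154, 3.874),
}


def black_share_and_ratio(black_n, black_rate, all_n, all_rate):
    """Return (black share of population, over-representation ratio).

    Population is proportional to victims / rate, so the rate units
    (per 1000k residents) cancel out in the share.
    """
    pop_share = (black_n / black_rate) / (all_n / all_rate)
    over_representation = black_rate / all_rate  # = victim share / population share
    return pop_share, over_representation


shares, ratios = zip(*(black_share_and_ratio(*row) for row in STATE_ROWS.values()))
fit = stats.linregress(shares, ratios)

if __name__ == "__main__":
    for state, (s, r) in zip(STATE_ROWS, zip(shares, ratios)):
        print(f"{state}: {s:.1%} of population, {r:.1f}x over-represented")
    print(f"slope={fit.slope:.2f}  R^2={fit.rvalue ** 2:.3f}  p={fit.pvalue:.3f}")
```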

This finding is quite interesting. We can't jump to conclusions, but one possible explanation is that in areas where black people make up a higher proportion of law enforcement personnel, they are less likely to be shot than in areas where black populations are small.

This would suggest that racial prejudice might be one of the main reasons for this relationship, and that policies specifically targeting states with smaller black populations might be beneficial.

R² = 0.200, p-value = 0.003: the relationship is statistically significant.

Interestingly, for the Hispanic population the relationship is reversed: the larger the Hispanic share of a state's population, the more over-represented Hispanic people are among victims. This might indicate that the increase in police shootings is community-related; however, this requires further investigation.

In any case, the absence of the pattern seen for African Americans suggests that Hispanic people are less likely to be affected by racial prejudice, even in areas where their population is relatively small.

2.4. Armed With and Threat Type

2.4.1 Threat Type By Race

In this section we examine the circumstances under which the victims were killed and check whether they vary significantly between races.

Additionally, we test whether the relationship between race and the victim possessing a firearm or engaging in specific activities is statistically significant.

To do that we use the chi-squared test of independence, which tests for association between two categorical variables. For a 2×2 table without continuity correction it is equivalent to a two-proportion z-test (χ² = z²), but it also extends to tables with more than two categories.
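A minimal sketch of the test with SciPy's `chi2_contingency`, using the Hypothesis 1 counts that follow. For 2×2 tables the function applies Yates' continuity correction by default, and with that default it gives a statistic of about 46.73, matching the reported value; passing `correction=False` gives the uncorrected statistic (about 47.1).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothesis 1 contingency table from the text:
# rows = is_black (Black, Other); columns = deadly_force (Other, Shoot/Attack)
observed = np.array([
    [953, 957],
    [3054, 2122],
])

ALPHA = 0.01

# For 2x2 tables chi2_contingency applies Yates' continuity correction by default
chi2, p_value, dof, expected = chi2_contingency(observed)

if __name__ == "__main__":
    print(f"Chi2 value: {chi2:.2f}")
    print(f"P-value: {p_value:.5f}")
    print(f"Degrees of freedom: {dof}")
    print("Reject null hypothesis" if p_value < ALPHA else "Fail to reject null hypothesis")
```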

We can see that the reported threat type does indeed vary between races, for instance:

Hypothesis 1. Black people are more likely than other groups to be killed while shooting at or actively attacking someone
deadly_force  Other  Shoot/Attack
is_black                         
Black           953           957
Other          3054          2122

Chi2 value: 46.73
P-value: 0.00000
Degrees of freedom: 1.00
Reject null hypothesis: There is a relationship between being black and being killed while shooting at or actively attacking someone.
*alpha = 0.01

It seems that black people are significantly more likely than others to be shot while shooting at or attacking someone (about 50% of black victims versus 41% of other victims). This is arguably evidence against the racial prejudice hypothesis expressed previously, since shooting is generally the most serious threat and the one most likely to be met with deadly force regardless of the shooter's race.
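Contingency tables like the one for Hypothesis 1 are cross-tabulations of two binary flags, and row shares make the size of the difference visible (957 of 1910 black victims versus 2122 of 5176 other victims). A minimal sketch of building such a table from record-level data with `pd.crosstab`; `threat_crosstab` is a hypothetical helper, and the column names and codes it relies on are assumptions about the raw schema.

```python
import pandas as pd


def threat_crosstab(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Build the is_black x deadly_force contingency table and its row shares.

    Assumed (hypothetical) schema: a `race` column where "B" marks black
    victims and a `threat_type` column where "shoot" and "attack" mark the
    most serious threats. Drop rows with missing race beforehand, otherwise
    they are counted as "Other".
    """
    is_black = df["race"].eq("B").map({True: "Black", False: "Other"}).rename("is_black")
    deadly_force = (
        df["threat_type"].isin(["shoot", "attack"])
        .map({True: "Shoot/Attack", False: "Other"})
        .rename("deadly_force")
    )
    counts = pd.crosstab(is_black, deadly_force)
    # normalize="index" turns each row into proportions that sum to 1
    shares = pd.crosstab(is_black, deadly_force, normalize="index")
    return counts, shares


if __name__ == "__main__":
    # Made-up rows, only to show the call
    toy = pd.DataFrame({
        "race": ["B", "B", "W", "H", "W"],
        "threat_type": ["shoot", "point", "attack", "threat", "point"],
    })
    counts, shares = threat_crosstab(toy)
    print(counts)
    print(shares.round(2))
```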

Hypothesis 4. Killings of black people are more likely to have no determined reason
is_undetermined  Other  Undetermined
is_black                            
Black             1836            74
Other             5019           157

Chi2 value: 2.87
P-value: 0.09031
Degrees of freedom: 1.00
Fail to reject null hypothesis: There is no relationship between undetermined threat type and victim being black.
*alpha = 0.05

In the main chart we noticed slightly more incidents where black people were shot with no reported threat type than for other races (74 of 1910, about 3.9%, versus 157 of 5176, about 3.0%). However, this difference is not statistically significant at α = 0.05 (p ≈ 0.09). It might still be worth examining further, as it could indicate that some of these people were shot without sufficient cause.

Hypothesis 2. Hispanic and other people are more likely to be killed when not armed with a firearm than black or white people
has_gun                Gun  Other
is_hispanic_or_other             
Black/White           3470   2074
Hispanic/Other         815    727

Chi2 value: 47.44
P-value: 0.00000
Degrees of freedom: 1.00
Reject null hypothesis: There is a relationship between being Hispanic or of another race and not being armed with a firearm when killed.
*alpha = 0.01

Another interesting observation is that Hispanic and other people are less likely to be armed with a gun when shot (about 53% versus 63% of black and white victims). This is also something that should be investigated further.

2.4.2 Armed With By Race
Hypothesis 6. Non-white people are significantly more likely to be killed when unarmed
was_unarmed  Other  Unarmed
is_white                   
Other         3178      274
White         3434      200

Chi2 value: 16.41
P-value: 0.00005
Degrees of freedom: 1.00
Reject null hypothesis: There is a relationship between the killed individual being unarmed and non-white.
*alpha = 0.01

This suggests that police officers are less likely to shoot white people unless they are carrying some sort of weapon. This may be an indication of racial prejudice and could mean that officers put more effort into defusing the situation before using deadly force when white people are involved.

However, since the number of such incidents is fairly low (274 unarmed non-white victims, about 7.9%, versus 200 unarmed white victims, about 5.5%), this does not affect a large proportion of police shootings.

The fact that Hispanic people are more likely to be shot while not armed with a gun seems to be mostly explained by them being more likely to have knives.

2.5 Flee Status

It seems that Hispanic and black people are more likely to be shot when fleeing than other people. Without additional data we can't say much about the reasons for this.

Either police officers are more likely to use deadly force when the fleeing individual is not white, or white people are less likely to flee (e.g. significantly more shootings of white people seem to be mental illness related, and such individuals are possibly less likely to attempt to run away).

was_mental_illness_related  False  True 
is_white                                
Other                        2890    562
White                        2642    992

Chi2 value: 124.87
P-value: 0.00000
Degrees of freedom: 1.00
Reject null hypothesis: There is a relationship between being white and the killing being mental health related.
*alpha = 0.01

About 27% of killings of white people were mental illness related, versus about 16% for others. Without additional data this is relatively hard to interpret. A more straightforward explanation would be that white people are less likely to be killed while committing deliberate property-related or other violent crimes. This could be developed further using additional crime statistics datasets (if available).

R² = 0.230, p-value < 0.001: the relationship is statistically significant.

The proportion of shootings that were related to mental health issues seems to vary wildly between different states. The situation in some of the North-Eastern states seems to be the worst.

However, we shouldn't jump to conclusions. If we look at the relationship between the total number of police shootings and the proportion of them that were mental health related, we can see that, while the correlation is not very strong, the proportion of shootings related to mental health increases as the overall number of shootings in a state decreases.

The most intuitive explanation is that mental health related shootings in states like New York (which have a very low number of shootings overall) are not necessarily a bigger issue than in other states, but that people in other states are much more likely to be shot for other reasons.

However we can still compare states with overall comparable numbers of police shootings, for example:

  • Maine, Washington, Vermont and New Hampshire have comparable rates of police shootings, but close to 50% of all shootings in New Hampshire are mental health related while only 15% in Maine are (with the other states in between). Barring reporting issues, and considering how similar these states are in most respects (so the overall prevalence of mental health issues is unlikely to be very different), the government of New Hampshire (and of other states in comparable situations) should consider adopting the mental health related policies used in states like Maine.

** Significant variance between the proportion of mental health related police killings in states with

2.7 Age Analysis

Victims who are white generally tend to be older (median: 39 years) than those who are black (31) or Hispanic and other (34). However, this mostly corresponds to the difference in median age in the overall population:

  • White 44 years
  • Black 35 years
  • Hispanic 30 years
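Per-group medians like these come from a simple group-by. A minimal sketch; `median_age_by_group` is a hypothetical helper, the `race` and `age` column names are assumptions about the dataset, and the toy rows are made up.

```python
import pandas as pd


def median_age_by_group(df: pd.DataFrame, group_col: str = "race", age_col: str = "age") -> pd.Series:
    """Median victim age per group; rows with a missing age are ignored."""
    return df.dropna(subset=[age_col]).groupby(group_col)[age_col].median()


if __name__ == "__main__":
    # Made-up rows, only to show the call
    toy = pd.DataFrame({"race": ["W", "W", "B", "B", "B"], "age": [30, 48, None, 25, 37]})
    print(median_age_by_group(toy))
```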

We can also see that shooting victims tend to be a bit younger than the rest of the population.

There seems to be no meaningful difference in age between genders.

Looking at the threat type, we can see that older people are much more likely to be shot while pointing a weapon, while younger people are more often shot while physically attacking someone, which seems fairly self-explanatory.

2.9 Body Camera Usage

Intuitively, we could expect that increasing usage of body cameras would have resulted in a decrease in police shootings; however, this has not been the case. There are several possible explanations for that:

  • Body cameras are being rolled out too slowly, and any effect they might have had has been overshadowed by increasing police killings (possibly related to the COVID-19 pandemic).
  • There is no meaningful relationship between camera usage and shootings, or it is very weak.

We can't measure this relationship statistically without additional data, such as a dataset of all police encounters and their outcomes, which is obviously unobtainable. There might be other approaches to estimating the effect body cameras have which could be worth investigating.

Hypothesis 5. White people are significantly less likely to be killed when the officer is wearing a body camera
body_camera  False  True 
is_white                 
Other         2805    647
White         3195    439

Chi2 value: 60.04
P-value: 0.00000
Degrees of freedom: 1.00
Reject null hypothesis: There is a relationship between the killed individual being white and the officer wearing a body camera.
*alpha = 0.01

1.2.10 Correlations

Variable pairs with Spearman correlation below -0.4 or above 0.4:

var I var II coef
0 flee_status_foot flee_status_not -0.42
1 armed_with_gun threat_type_shoot 0.51
2 age age_bracket_short_45+ 0.75
3 flee_status_car flee_status_not -0.47
4 race_Black race_White -0.47
5 armed_with_undetermined threat_type_undetermined 0.49
6 armed_with_gun armed_with_knife -0.53
7 gender_female gender_male -0.94
8 armed_with_gun threat_type_attack -0.43
9 age age_bracket_short_25 or younger -0.68
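A minimal sketch of producing such a list with pandas. One-hot encoding via `pd.get_dummies` is an assumption about how the dummy columns (e.g. `race_Black`, `gender_male`) were created, and `strong_spearman_pairs` is a hypothetical helper. Some strong pairs are mechanical: `gender_female` and `gender_male` are near-mirror images of each other, and `age` is correlated with the age-bracket dummies derived from it.

```python
from itertools import combinations

import pandas as pd


def strong_spearman_pairs(df: pd.DataFrame, threshold: float = 0.4) -> pd.DataFrame:
    """Variable pairs whose Spearman correlation is below -threshold or above threshold.

    Categorical columns are one-hot encoded first, so e.g. a `gender` column
    becomes gender_female, gender_male (column names are assumptions).
    """
    encoded = pd.get_dummies(df, dtype=float)
    corr = encoded.corr(method="spearman")
    rows = []
    # Each unordered pair is visited once, so the diagonal and mirror pairs are skipped
    for a, b in combinations(corr.columns, 2):
        coef = corr.loc[a, b]
        if abs(coef) > threshold:  # NaN (e.g. a constant column) never passes
            rows.append((a, b, round(coef, 2)))
    return pd.DataFrame(rows, columns=["var I", "var II", "coef"])


if __name__ == "__main__":
    # Made-up rows, only to show the call
    toy = pd.DataFrame({"age": [20, 50, 30, 60], "gender": ["male", "male", "female", "male"]})
    print(strong_spearman_pairs(toy))
```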

1.3 Analysis

After examining the dataset and performing some basic hypothesis testing we've found that there are some significant differences between the characteristics of victims depending on their race and age:

  • Black people are more likely than other groups to be killed while shooting at or actively attacking someone
  • Hispanic and other people are more likely to be killed when not armed with a firearm than black or white people
  • Killings of white individuals are more likely to be related to mental health issues
  • Killings of black people appear slightly more likely to have no determined reason than for other groups, although this difference was not statistically significant (p ≈ 0.09)
  • White people are significantly less likely to be killed when the officer is wearing a body camera
  • Non-white people are significantly more likely to be killed when unarmed
  • People who are 45 or older are more likely to be killed while pointing a firearm

However, without additional data we can't determine whether these relationships exist due to some underlying cause (e.g. systemic discrimination or biases of law enforcement agents, socioeconomic differences between racial groups etc.). This is something that needs to be investigated further.

However, even if we were able to provide a more reliable explanation for these relationships, that does not mean we would be able to derive actionable decisions for the United States Department of Justice. Addressing them might require complex socioeconomic policies, which are not something the Department of Justice controls.

Actionable Decisions

One other important aspect we must take into account is that, while all police shootings are regrettable, the majority of them are justifiable in the sense that the victim was shot while committing a violent crime and threatening the life and safety of other individuals and/or police officers.

While it's possible that training police officers to handle such situations using less lethal methods could decrease the number of deaths, this is not something we can analyze using the data we have.

Instead, we'll focus on demographic, social, economic and other macro factors which can explain the varying levels of police shootings between states, in order to: 1. determine the factors which explain the variance in police shootings; 2. find factors which can be influenced by federal and local governments.

This might allow police departments in different states to adopt policies, training standards etc. from other jurisdictions which is potentially a relatively straightforward way to decrease the incidence of police shootings.

3.1 Explaining Differences Between States:

One possible approach could be to try and find demographically similar US states which have significantly different numbers of shootings per capita. If such states exist we can try to find whether this can be explained by some other variable or attribute which could be theoretically influenced by local or state governments.

3.1 Homicide Rates

We would expect the number of police shootings to be more or less proportional to the level of violent crime in any given state:

R² = 0.074, p-value = 0.056: the relationship is not statistically significant.

Interestingly, the correlation between homicides and police shootings is very low, and only a small proportion of the variability in police shootings between states can be explained by it.

1.3.1.2 Police Spending Per Capita

Next, let's look at spending on law enforcement per capita (as a proportion of per capita income in the state):

R² = 0.101, p-value = 0.022: the relationship is statistically significant.

While the relationship between spending on law enforcement and the number of police shootings is relatively weak, surprisingly it is positive and statistically significant: the more a state spends on police, the more people end up being shot. We shouldn't jump to conclusions based on this alone, though. It's possible that there are other factors at play:

  • There is more crime in poorer states, requiring more resources for law enforcement (however, we have already partially disproven this by looking at the homicide rate)
  • The type of homicides matters (e.g. high levels of drug-related or organized crime probably require more resources to police than high levels of domestic homicides)
  • Other sociodemographic variables which are possibly correlated with police spending (e.g. population density) offer a stronger explanation
  • Allocation of spending (e.g. in some states police officers might be expected to provide services which are provided by other organizations in other states)

etc.

1.3.1.1 Clustering States by Demographics

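Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters by repeatedly merging the two most similar clusters. The variance-minimizing criterion used here corresponds to Ward linkage: at each step it merges the pair of clusters whose union least increases the total within-cluster variance.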

The states that end up on the same branch are most similar to each other based on these factors:

  • Persons 65 years and over, percent
  • White alone, percent
  • Black or African American alone, percent
  • Hispanic or Latino, percent
  • Foreign born persons, percent
  • Language other than English spoken at home, pct age 5+
  • High school graduate or higher, percent of persons age 25+
  • Bachelor's degree or higher, percent of persons age 25+
  • Homeownership rate
  • Housing units in multi-unit structures, percent
  • Median household income
  • Persons below poverty level, percent
  • Population per square mile, 2010
  • police_prop_income
  • Homocide per 1000k

We need to find a threshold at which we'll have a reasonable number of clusters. In addition to relying on the dendrogram, we can also measure the silhouette score:

Considering we want to avoid too high a number of clusters, a distance threshold of 9 seems like a good option.

Total Clusters: 5

Explained variance: [0.37028356 0.24933625]
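The clustering and the two-element "Explained variance" output above can be sketched with SciPy and scikit-learn. This is a minimal sketch under stated assumptions: Ward linkage on standardized features, a dendrogram cut at the distance threshold of 9, the silhouette score to compare cuts, and a two-component PCA projection. The text does not say what the "Explained variance" values are; they look like PCA explained-variance ratios (which would mean a 2D view captures roughly 62% of the variance), but that is an assumption. `cluster_states` and `project_2d` are hypothetical helpers, and the usage example runs on synthetic data, not the state dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_states(features: np.ndarray, threshold: float = 9.0):
    """Ward-linkage hierarchical clustering cut at a distance threshold.

    `features` is an (n_states, n_features) array of the variables listed above.
    Standardizing first (an assumption) keeps e.g. median income in dollars from
    dominating the percentage columns. Returns one cluster label per state and
    the silhouette score (NaN when the cut gives a single cluster or only singletons).
    """
    scaled = StandardScaler().fit_transform(features)
    tree = linkage(scaled, method="ward")
    labels = fcluster(tree, t=threshold, criterion="distance")
    n_clusters = len(np.unique(labels))
    if 1 < n_clusters < len(labels):
        score = silhouette_score(scaled, labels)
    else:
        score = float("nan")
    return labels, score


def project_2d(features: np.ndarray):
    """2D PCA projection of the standardized features and each component's explained-variance ratio."""
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(scaled)
    return coords, pca.explained_variance_ratio_


if __name__ == "__main__":
    # Synthetic stand-in for the state table: two well-separated groups of 10 "states"
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, (10, 3)), rng.normal(5.0, 0.1, (10, 3))])
    for t in (5.0, 9.0, 15.0):
        labels, score = cluster_states(X, threshold=t)
        print(f"threshold={t}: {len(np.unique(labels))} clusters, silhouette={score:.2f}")
    coords, ratios = project_2d(X)
    print("Explained variance:", ratios)
```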

1.3 Analysis

Let's build a statistical model to try and determine which of the demographic and other variables are best at explaining the variance in police shootings between different states.

We can't use a random forest, because the low number of observations (one per state) would likely result in overfitting.

Multiple linear regression is also possibly not the best option due to the high number of explanatory (independent) variables relative to the number of observations.

Let's look at the correlation between the explanatory variables before we choose a model:

1.3.1 Correlation and Preparing the Dataset
1.3.2 Elastic Net Linear Regression Model

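Considering that many of the explanatory variables are strongly correlated, we'll use an Elastic Net model rather than, for instance, Lasso regression: Lasso tends to keep just one variable out of a group of correlated predictors somewhat arbitrarily, while Elastic Net's additional L2 (ridge) penalty tends to keep or drop correlated predictors together.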
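A minimal sketch of fitting such a model with scikit-learn's `ElasticNetCV`. The text does not say how the predictors were scaled or how the penalty strength and L1/L2 mix were chosen, so standardization and cross-validated selection over a small grid of `l1_ratio` values are assumptions, and `fit_elastic_net` is a hypothetical helper. Standardizing matters because the penalty treats all coefficients alike, so variables on large scales (e.g. median household income in dollars) would otherwise be penalized inconsistently with percentages.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_elastic_net(X: pd.DataFrame, y: pd.Series, cv: int = 5) -> pd.Series:
    """Fit an Elastic Net on standardized predictors and return its coefficients.

    The penalty strength (alpha) and the L1/L2 mix (l1_ratio) are chosen by
    cross-validation. Coefficients are on the standardized scale, so their
    magnitudes can be compared across variables; exact zeros are variables
    the L1 part of the penalty dropped.
    """
    model = make_pipeline(
        StandardScaler(),
        ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0], cv=cv),
    )
    model.fit(X, y)
    enet = model[-1]  # last pipeline step: the fitted ElasticNetCV
    return pd.Series(enet.coef_, index=X.columns)


if __name__ == "__main__":
    # Synthetic stand-in for the state table: 51 rows, only "a" and "b" matter
    rng = np.random.default_rng(1)
    X = pd.DataFrame(rng.normal(size=(51, 4)), columns=["a", "b", "c", "d"])
    y = 2 * X["a"] - X["b"] + rng.normal(scale=0.1, size=51)
    print(fit_elastic_net(X, y))
```

The returned Series has the same shape as the coefficient list that follows: one coefficient per variable, with exact zeros for variables the L1 part of the penalty removed.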

Persons 65 years and over, percent                           -0.131234
White alone, percent                                          0.000000
Foreign born persons, percent                                -0.000000
High school graduate or higher, percent of persons age 25+    0.000000
Bachelor's degree or higher, percent of persons age 25+      -0.000000
Homeownership rate                                           -0.220462
Housing units in multi-unit structures, percent              -0.739171
Median household income                                       0.000000
Persons below poverty level, percent                          0.064068
Population per square mile, 2010                             -0.422243
police_prop_income                                            0.466692
Homocide per 1000k                                            0.000000
home_price_to_income                                          0.077081
1.3.3 Interpreting Model Results

Any conclusions we make based on these results should obviously be taken with a grain of salt; however, they do show some possibly surprising findings:

  • Racial diversity (the proportion of non-white population) has no influence on the number of shootings per capita in this model.

  • However, population density and concentration seem to be important factors. Specifically, the proportion of people living in multi-unit housing (apartments) seems to be the strongest predictor, with a higher proportion associated with fewer shootings. There are likely several non-straightforward interpretations of this; however, in combination with population density, this might imply that:

    • police officers tend to behave differently depending on how likely other people and bystanders are to witness their actions.
    • it's also possible that officers feel less safe in less densely populated areas because it might take longer for other officers to reach them.
    • People shot by police are more likely to die if it occurs in areas with poor coverage by emergency services and it takes a long time for them to arrive.

    We can't test the validity of any of these hypotheses, but it might be worth examining them further because they all seem to be highly actionable (improving police training, strategies for acting around dangerous individuals like waiting for backup etc.)

  • The homicide rate seems to have no effect on the number of police shootings, while the amount spent per capita on policing in the state is a relatively strong predictor.

    • This implies that there is no link between the general level of extreme violence in a state and the number of police shootings. This is highly concerning, since using deadly force is only justifiable when the life of the officer or somebody else is in danger. However, there seems to be no relationship between the actual likelihood of a life-threatening event occurring and the decision by a law enforcement officer to use deadly force.

      This is something that certainly should be investigated further and is also possibly highly actionable, especially because certain states handle this much better (like New York) and their practices might be applied in states which handle it much worse, like New Mexico.

  • High police spending seems to have a moderate effect on the incidence of police shootings. Combined with the homicide statistics, this is also highly concerning: increased spending on police, in this case at least, seems to produce a more negative outcome. It's hard to determine why this might be the case. However, it's possible that significant proportions of funding are misallocated (e.g. spent on unnecessary equipment) and might be better used to improve training. Even barring that, it might mean that a smaller police presence could decrease the number of police shootings while having no effect on the murder rate (it's important to note that other crime statistics are not taken into account here).

Conclusions

  • The difference in numbers can't be fully explained by "immutable" socio-demographic variables:

    • Black people are significantly more likely to be shot in states where they make up only a small proportion of the population:

      e.g. in Southern states the rate of police shootings of black people is much lower than in some northern and Midwestern states. This might indicate that white police officers in those areas target black people more often than in areas where African Americans are better integrated into local police departments etc.

      This phenomenon can't be observed with other racial groups like Hispanics or Native Americans.

    • There seems to be almost no relationship between the murder rate and the incidence of police shootings in different states.

      This is a crucial finding because it implies that higher levels of police shootings can't be explained by a higher prevalence of violent crime. This is something that certainly needs to be looked into further by the Justice Department. One option would be to encourage the adoption of practices used in more successful states across the country.

    • Higher police spending (as a proportion of state per capita income) seems to result in a higher number of police shootings. Taken together with what we found about violent crime, this is also highly concerning. Since the homicide rate does not appear to explain the number of police shootings, this may imply that overpolicing is an issue (especially considering the recent trend of police departments adopting military-grade weaponry and equipment).

      It's possible that decreasing spending on law enforcement might decrease the number of police shootings while having no effect on the homicide rate; however, we have not looked into its relationship with the overall level of crime.

  • We can make no definitive conclusions on body camera usage:

    • While the proportion of police officers wearing a camera during shootings has increased over the last 8 years, the number of shootings has actually increased. Without additional data we can't determine whether their adoption had a limited effect or whether the effect is not statistically noticeable because the overall number has increased due to other factors.

Limitations

  • The core of the analysis is based on US states, which means the number of samples is quite low and might be too low for some models. It might be worth going down a level and using Combined Statistical Areas instead (groupings of counties based on interconnected, usually urban, areas).

  • It would be a good idea to look at more variables, like the number of police interactions and the likelihood of them ending in a police shooting based on the target's socio-economic status, race, mental state etc., and other factors like whether the officer is wearing a body camera, their training level etc. Of course, such datasets are probably unobtainable without significant resources.